Towards Hybrid Quality-Oriented Machine Translation
Abstract
We present a hybrid MT architecture, combining state-of-the-art linguistic processing with advanced stochastic techniques. Grounded in a theoretical reflection on the division of labor between rule-based and probabilistic elements in the MT task, we summarize per-component approaches to ranking, including empirical results when evaluated in isolation. Combining component-internal scores and a number of additional sources of (probabilistic) information, we explore discriminative re-ranking of n-best lists of candidate translations through an eclectic combination of knowledge sources, and provide evaluation results for various configurations.

1 Background: Motivation

Machine Translation is back in fashion, with data-driven approaches and specifically Statistical MT (SMT) as the predominant paradigm, both in terms of scientific interest and evaluation results in MT competitions. But (fully automated) machine translation remains a hard, if not ultimately impossible, challenge. The task encompasses not only all strata of linguistic description, from phonology to discourse, but in the general case requires potentially unlimited knowledge about the actual world and situated language use (Kay, 1980, 1997). Although the majority of commercial MT systems still have large sets of hand-crafted rules at their core (often using techniques first invented in the 1960s and 1970s), MT research in the once mainstream linguistic tradition has become the privilege of a small, faithful minority.

Like a growing number of colleagues, we question the long-term value of purely statistical (or data-driven) approaches, both practically and scientifically. Large (parallel) training corpora remain scarce for most languages, and word- and phrase-level alignment continue to be active research topics. Assuming sufficient training material, statistical translation quality still leaves much to be desired; and probabilistic NLP experience in general suggests that one must expect ‘ceiling’ effects on system evolution. Statistical MT research has yet to find a satisfactory role for linguistic analysis; on its own, it does not further our understanding of language.

Progress on combining rule-based and data-driven approaches to MT will depend on a sustained stream of state-of-the-art, MT-oriented linguistics research. The Norwegian LOGON initiative capitalizes on linguistic precision for high-quality translation and, accordingly, puts scalable, general-purpose linguistic resources, complemented with advanced stochastic components, at its core. Despite frequent cycles of overly high hopes and subsequent disillusionment, MT in our view is the type of application that may demand knowledge-heavy, ‘deep’ approaches to NLP for its ultimate, long-term success.

Much like Riezler & Maxwell III (2006) and Llitjós & Vogel (2007), and being faithful minority members ourselves, we approach a hybrid MT architecture with a semantic transfer backbone as our vantage point. Plurality of approaches to grammatical description, reusability of component parts, and the interplay of linguistic and stochastic processes are among the strong points of the LOGON system. In the following, we provide a brief overview of the LOGON architecture (§ 2) and a bit of theoretical reflection on the role of probability theory
Publication date: 2007